Smart Paper Notes

Spatio-temporal Relation Modeling for Few-shot Action Recognition

Anirudh Thatipelli , Sanath Narayan , Salman Khan , Rao Muhammad Anwer , Fahad Shahbaz Khan , Bernard Ghanem

Category: Computer Vision

2021-12-09

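We propose a novel few-shot action recognition framework that enhances class-specific feature discriminability while simultaneously learning higher-order temporal representations. The focus of our approach is a novel spatio-temporal enrichment module that aggregates spatial and temporal contexts with dedicated local patch-level and global frame-level enrichment sub-modules. Local patch-level enrichment captures the appearance-based characteristics of actions. Global frame-level enrichment, on the other hand, explicitly encodes the broad temporal context, thereby capturing the relevant object features over time. The resulting spatio-temporally enriched representations are then used to learn the relational matching between query and support action sub-sequences. We further introduce a query-class similarity classifier on the patch-level enriched features to enhance class-specific feature discriminability by reinforcing feature learning in the proposed framework. Experiments are performed on four few-shot action recognition benchmarks: Kinetics, SSV2, HMDB51 and UCF101. Our extensive ablation study reveals the benefits of the proposed contributions. Furthermore, our approach sets a new state of the art on all four benchmarks. On the challenging SSV2 benchmark, our approach achieves an absolute gain of 3.5% in classification accuracy compared to the best existing method in the literature. Our code and models will be publicly released.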
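
To make the idea of matching a query against support classes concrete, here is a minimal sketch of a generic prototype-based similarity classifier. It is not the framework described in the abstract: it assumes each video has already been reduced to a single feature vector, uses the class mean as a prototype, and scores by cosine similarity. The class names and feature values are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify_query(query: Vector, support: Dict[str, List[Vector]]) -> str:
    """Return the class whose prototype (mean support feature) is most similar to the query."""
    prototypes = {label: mean_vector(feats) for label, feats in support.items()}
    return max(prototypes, key=lambda label: cosine(query, prototypes[label]))


# Hypothetical 2-way, 2-shot episode with toy 2-D features.
support = {
    "jump": [[1.0, 0.1], [0.9, 0.2]],
    "run": [[0.1, 1.0], [0.2, 0.8]],
}
prediction = classify_query([0.8, 0.3], support)
```

In this toy episode the query lies closest to the "jump" prototype, so `prediction` is `"jump"`.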

Translated by Google Translate

OW-DETR: Open-world Detection Transformer

Akshita Gupta , Sanath Narayan , K J Joseph , Salman Khan , Fahad Shahbaz Khan , Mubarak Shah

Category: Computer Vision

2021-12-02

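Open-world object detection (OWOD) is a challenging computer vision problem in which the task is to detect a known set of object categories while simultaneously identifying unknown objects. In addition, the model must incrementally learn new classes that become known in subsequent training episodes. Unlike standard object detection, the OWOD setting poses significant challenges: generating quality candidate proposals on potentially unknown objects, separating unknown objects from the background, and detecting diverse unknown objects. Here, we introduce OW-DETR, a novel end-to-end transformer-based framework for open-world object detection. The proposed OW-DETR comprises three dedicated components, namely attention-driven pseudo-labeling, novelty classification and objectness scoring, to explicitly address the aforementioned OWOD challenges. Our OW-DETR explicitly encodes multi-scale contextual information, has less inductive bias, enables knowledge transfer from known classes to the unknown class, and can better discriminate between unknown objects and background. Comprehensive experiments are performed on two benchmarks: MS-COCO and Pascal VOC. Extensive ablations reveal the merits of our proposed contributions. Further, our model outperforms the recently introduced OWOD approach, ORE, with absolute gains ranging from 1.8% to 3.3% in unknown recall on the MS-COCO benchmark. For incremental object detection, OW-DETR outperforms the state of the art in all settings on the Pascal VOC benchmark. Our code and models will be publicly released.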

On the Interpretability of Attention Networks

Lakshmi Narayan Pandey , Rahul Vashisht , Harish G. Ramaswamy

Category: Machine Learning

2022-12-30

Attention mechanisms form a core component of several successful deep learning architectures, and are based on one key idea: "The output depends only on a small (but unknown) segment of the input." In several practical applications like image captioning and language translation, this is mostly true. In trained models with an attention mechanism, the outputs of an intermediate module that encodes the segment of input responsible for the output are often used as a way to peek into the "reasoning" of the network. We make such a notion more precise for a variant of the classification problem that we term selective dependence classification (SDC) when used with attention model architectures. Under such a setting, we demonstrate various error modes where an attention model can be accurate but fail to be interpretable, and show that such models do occur as a result of training. We illustrate various situations that can accentuate and mitigate this behaviour. Finally, we use our objective definition of interpretability for SDC tasks to evaluate a few attention model learning algorithms designed to encourage sparsity and demonstrate that these algorithms help improve interpretability.

mFACE: Multilingual Summarization with Factual Consistency Evaluation

Roee Aharoni , Shashi Narayan , Joshua Maynez , Jonathan Herzig , Elizabeth Clark , Mirella Lapata

Category: Natural Language Processing

2022-12-20

Abstractive summarization has enjoyed renewed interest in recent years, thanks to pre-trained language models and the availability of large-scale datasets. Despite promising results, current models still suffer from generating factually inconsistent summaries, reducing their utility for real-world application. Several recent efforts attempt to address this by devising models that automatically detect factual inconsistencies in machine generated summaries. However, they focus exclusively on English, a language with abundant resources. In this work, we leverage factual consistency evaluation models to improve multilingual summarization. We explore two intuitive approaches to mitigate hallucinations based on the signal provided by a multilingual NLI model, namely data filtering and controlled generation. Experimental results in the 45 languages from the XLSum dataset show gains over strong baselines in both automatic and human evaluation.
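
The data-filtering approach mentioned in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: `toy_entailment_score` is a hypothetical stand-in for a multilingual NLI model (a token-overlap heuristic, used only so the example runs), and the 0.5 threshold is an assumed value.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source document, reference summary)


def toy_entailment_score(document: str, summary: str) -> float:
    """Hypothetical stand-in for an NLI model: fraction of summary tokens present in the document."""
    doc_tokens = set(document.lower().split())
    summary_tokens = summary.lower().split()
    if not summary_tokens:
        return 0.0
    supported = sum(1 for token in summary_tokens if token in doc_tokens)
    return supported / len(summary_tokens)


def filter_training_pairs(
    pairs: List[Pair],
    score_fn: Callable[[str, str], float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> List[Pair]:
    """Keep only the pairs whose summary is scored as supported by its document."""
    return [(doc, summ) for doc, summ in pairs if score_fn(doc, summ) >= threshold]


pairs = [
    ("the cat sat on the mat", "the cat sat"),
    ("the cat sat on the mat", "a dog barked loudly"),
]
kept = filter_training_pairs(pairs, toy_entailment_score)
```

Here only the first pair survives, because none of the tokens in the second summary appear in its document. A real pipeline would replace the toy scorer with the entailment probability of an NLI model.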

Little Red Riding Hood Goes Around the Globe: Crosslingual Story Planning and Generation with Large Language Models

Evgeniia Razumovskaia , Joshua Maynez , Annie Louis , Mirella Lapata , Shashi Narayan

Category: Natural Language Processing

2022-12-20

We consider the problem of automatically generating stories in multiple languages. Compared to prior work in monolingual story generation, crosslingual story generation allows for more universal research on story planning. We propose to use Prompting Large Language Models with Plans to study which plan is optimal for story generation. We consider 4 types of plans and systematically analyse how the outputs differ for different planning strategies. The study demonstrates that formulating the plans as question-answer pairs leads to more coherent generated stories while the plan gives more control to the story creators.

Accu-Help: A Machine Learning based Smart Healthcare Framework for Accurate Detection of Obsessive Compulsive Disorder

Kabita Patel , Ajaya Kumar Tripathy , Laxmi Narayan Padhy , Sujita Kumar Kar , Susanta Kumar Padhy , Saraju Prasad Mohanty

Category: Machine Learning

2022-12-05

The importance of smart healthcare in recent years cannot be overstated. This work proposes to extend the state of the art in smart healthcare by integrating solutions for Obsessive Compulsive Disorder (OCD). Identification of OCD from oxidative stress biomarkers (OSBs) using machine learning is an important development in the study of OCD. However, this process involves collecting OCD class labels from hospitals, collecting the corresponding OSBs from biochemical laboratories, creating an integrated and labeled dataset, using a suitable machine learning algorithm to design an OCD prediction model, and making these prediction models available to different biochemical laboratories for OCD prediction on unlabeled OSBs. Further, as the volume of labeled samples in the dataset grows significantly over time, the prediction model must be redesigned for further use. The whole process requires distributed data collection, data integration, coordination between the hospital and the biochemical laboratory, dynamic design of the machine learning OCD prediction model using a suitable machine learning algorithm, and making the machine learning model available to the biochemical laboratories. With all this in mind, Accu-Help, a fully automated, smart, and accurate conceptual model for OCD detection, is proposed to help biochemical laboratories detect OCD efficiently from OSBs. OSBs are classified into three classes: Healthy Individual (HI), OCD Affected Individual (OAI), and Genetically Affected Individual (GAI). The main component of the proposed framework is the design of the machine learning OCD prediction model. In Accu-Help, a neural network-based approach is presented with an OCD prediction accuracy of 86 percent.

Multi-Robot Coordination and Cooperation with Task Precedence Relationships

Walker Gosrich , Siddharth Mayya , Saaketh Narayan , Matthew Malencia , Saurav Agarwal , Vijay Kumar

Category: Robotics

2022-09-28

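We propose a new formulation for the multi-robot task planning and allocation problem that incorporates (a) precedence relationships between tasks; (b) coordination on tasks, allowing multiple robots to achieve increased efficiency; and (c) cooperation through the formation of robot coalitions for tasks that individual robots cannot perform alone. In our formulation, a task graph specifies the tasks and the relationships between them. We define a set of reward functions over the nodes and edges of the task graph. These functions model the effect of robot coalition size on task performance and incorporate the influence of one task's performance on a dependent task. Solving this problem optimally is NP-hard. However, the task graph formulation allows us to leverage min-cost network flow approaches to obtain approximate solutions efficiently. In addition, we explore a mixed-integer programming approach, which gives optimal solutions for small instances of the problem but is computationally expensive. We also develop a greedy heuristic algorithm as a baseline. Our modeling and solution approaches produce task plans that leverage task precedence relationships together with robot coordination and cooperation to achieve high mission performance, even in large missions with many agents.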
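
As a rough illustration of what a greedy baseline for this kind of problem might look like, the sketch below assigns robots one at a time to the task with the largest marginal reward, subject to precedence. It is a simplified, hypothetical formulation rather than the heuristic from the paper: rewards are defined per task only (no edge rewards), a task becomes eligible once every predecessor has at least one robot, and the task names and reward functions are invented for the example.

```python
from typing import Callable, Dict, List

RewardFn = Callable[[int], float]  # maps coalition size to task reward


def greedy_allocate(
    num_robots: int,
    rewards: Dict[str, RewardFn],
    predecessors: Dict[str, List[str]],
) -> Dict[str, int]:
    """Assign robots one by one to the eligible task with the largest positive marginal reward."""
    assignment = {task: 0 for task in rewards}
    for _ in range(num_robots):
        best_task, best_gain = None, 0.0
        for task, reward_fn in rewards.items():
            # Skip tasks whose predecessors have not been staffed yet.
            if any(assignment[p] == 0 for p in predecessors.get(task, [])):
                continue
            gain = reward_fn(assignment[task] + 1) - reward_fn(assignment[task])
            if gain > best_gain:
                best_task, best_gain = task, gain
        if best_task is None:
            break  # no eligible assignment improves the total reward
        assignment[best_task] += 1
    return assignment


# Hypothetical two-task mission: "haul" depends on "scout".
rewards = {
    "scout": lambda n: 2.0 * min(n, 1),  # one robot is enough
    "haul": lambda n: 3.0 * n ** 0.5,    # diminishing returns with coalition size
}
predecessors = {"haul": ["scout"]}
plan = greedy_allocate(3, rewards, predecessors)
```

With three robots the sketch first staffs "scout", then puts the remaining two robots on "haul", giving `{"scout": 1, "haul": 2}`. A purely marginal rule like this cannot see tasks that only pay off once a coalition reaches a minimum size, which is one reason a greedy method serves as a baseline rather than a full solution.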

Unsupervised Early Exit in DNNs with Multiple Exits

Hari Narayan N U , Manjesh K. Hanawal , Avinash Bhardwaj

Category: Machine Learning | Artificial Intelligence | Natural Language Processing

2022-09-20

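Deep neural networks (DNNs) are generally designed as sequentially cascaded differentiable blocks/layers with a prediction module connected only to the last layer. DNNs can instead be fitted with prediction modules at multiple points along the backbone, so that inference can stop at an intermediate stage without passing through all the modules. The last exit point may offer a lower prediction error, but it also involves more computational resources and latency. An exit point that is "optimal" in terms of both prediction error and cost is desirable. The optimal exit point may depend on the latent distribution of the tasks and may change from one task type to another. During inference, the ground truth of instances may not be available, so the error rate at each exit point cannot be estimated. One is therefore faced with the problem of selecting the optimal exit in an unsupervised setting. Prior work tackled this problem in an offline supervised setting, assuming that enough labeled data is available to estimate the error rate at each exit point and tune the parameters for better accuracy. However, pre-trained DNNs are often deployed in new domains for which a large amount of ground truth may not be available. We model exit selection as an unsupervised online learning problem and use bandit theory to identify the optimal exit point. Specifically, we focus on Elastic BERT, a pre-trained multi-exit DNN, and demonstrate that it "nearly" satisfies the Strong Dominance (SD) property, making it possible to learn the optimal exit in an online setup without knowing the ground-truth labels. We develop an upper confidence bound (UCB) based algorithm named UEE-UCB that provably achieves sub-linear regret under the SD property. Our method thus provides a means to adaptively learn domain-specific optimal exit points in multi-exit DNNs. We empirically validate our algorithm on the IMDB and Yelp datasets.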
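
For readers unfamiliar with bandit-based selection, the following sketch shows the standard UCB1 rule applied to a choice among exits. It is not UEE-UCB: it assumes a reward in [0, 1] is observed after every decision, which the unsupervised setting described in the abstract does not provide. The Bernoulli reward means, the number of rounds, and all function names are assumptions made for the example.

```python
import math
import random
from typing import List


def ucb1_select_exit(counts: List[int], means: List[float], t: int) -> int:
    """Pick the exit with the highest UCB1 index; any exit not yet tried is chosen first."""
    for index, pulls in enumerate(counts):
        if pulls == 0:
            return index
    return max(
        range(len(counts)),
        key=lambda i: means[i] + math.sqrt(2.0 * math.log(t) / counts[i]),
    )


def run_bandit(expected_rewards: List[float], rounds: int, seed: int = 0) -> List[int]:
    """Simulate UCB1 with Bernoulli rewards and return how often each exit was chosen."""
    rng = random.Random(seed)
    counts = [0] * len(expected_rewards)
    means = [0.0] * len(expected_rewards)
    for t in range(1, rounds + 1):
        chosen = ucb1_select_exit(counts, means, t)
        # Assumed feedback: 1.0 with the exit's (hidden) success probability, else 0.0.
        reward = 1.0 if rng.random() < expected_rewards[chosen] else 0.0
        counts[chosen] += 1
        means[chosen] += (reward - means[chosen]) / counts[chosen]  # running average
    return counts


# Hypothetical three-exit model where the middle exit has the best accuracy/cost trade-off.
counts = run_bandit([0.2, 0.8, 0.5], rounds=2000)
```

Over the simulated rounds the middle exit ends up being selected far more often than the other two, which is the behaviour a sub-linear regret guarantee describes: the share of decisions spent on suboptimal exits shrinks as the number of rounds grows.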

DeePhy: On Deepfake Phylogeny

Kartik Narayan , Harsh Agarwal , Kartik Thakral , Surbhi Mittal , Mayank Vatsa , Richa Singh

Category: Computer Vision

2022-09-19

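Deepfake refers to tailored and synthetically generated videos, which are now prevalent and spreading on a large scale, threatening the trustworthiness of information available online. While existing datasets contain different kinds of deepfakes that vary in their generation technique, they do not consider the progression of deepfakes in a "phylogenetic" manner. An existing deepfake face may itself be swapped with another face. This face-swapping process can be performed multiple times, and the resulting deepfake can evolve to confuse deepfake detection algorithms. Further, many databases do not provide the generative model employed as a target label. Model attribution helps enhance the explainability of detection results by providing information about the generative model used. To enable the research community to address these questions, this paper proposes DeePhy, a novel deepfake phylogeny dataset consisting of 5040 deepfake videos generated using three different generation techniques. There are 840 videos of once-swapped deepfakes, 2520 videos of twice-swapped deepfakes and 1680 videos of thrice-swapped deepfakes. Over 30 GB in size, the database was prepared in over 1100 hours using 18 GPUs with 1,352 GB of cumulative memory. We also present a benchmark on the DeePhy dataset using six deepfake detection algorithms. The results highlight the need to advance research on model attribution of deepfakes and to generalize the process over a variety of deepfake generation techniques. The database is available at: http://iab-rubric.org/deephy-database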

Robust Artificial Delay based Impedance Control of Robotic Manipulators with Uncertain Dynamics

Udayan Banerjee , Bhabani Shankar Dey , Indra Narayan Kar , Subir Kumar Saha

Category: Robotics

2022-08-18

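This paper proposes an artificial delay based impedance controller for robotic manipulators with uncertain dynamics. The control law unites the time-delayed estimation (TDE) framework with a second-order switching controller of the super-twisting algorithm (STA) type via a novel generalized filtered tracking error (GFTE). The TDE framework removes the need for accurate modeling of the robot dynamics by estimating the uncertain robot dynamics and interaction forces from the immediate past data of state and control effort, while the second-order switching control law in the outer loop provides robustness against the TDE error that arises from approximating the manipulator dynamics. The proposed control law thus seeks to establish a desired impedance model between the robot end-effector variables, i.e. force and motion, in the presence of uncertainties, both when encountering smooth contact forces and during free motion. Simulation results for a two-link manipulator using the proposed controller, along with a convergence analysis, are presented to validate the proposition.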
